Advanced Regression Assignment A

Zach Wolpe
WLPZAC001

01 June 2020



Generate the data

The data are generated on the unit square: \(x\) & \(z\) are each \(100\) equally spaced points between \(0\) and \(1\), & the response \(y=f(x,z)\) is generated as a function of \(x\) & \(z\):

Noise is then added to the data:

\[e \sim N(0,0.1^2)\]

Here we visualize both the true & the noisy function.
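The data-generation step above can be sketched as follows. Note the true \(f(x,z)\) is not reproduced in this excerpt, so a double-bump surface stands in for it here as a labelled assumption:

```python
import numpy as np

# Hypothetical stand-in: the true f(x, z) is not shown in this excerpt,
# so a double-bump surface plays its role for illustration only.
def f(x, z):
    return np.exp(-((x - 0.3)**2 + (z - 0.3)**2) / 0.05) + \
           np.exp(-((x - 0.7)**2 + (z - 0.7)**2) / 0.05)

# 100 equally spaced points per axis on the unit square -> 10000 grid points
x = np.linspace(0, 1, 100)
z = np.linspace(0, 1, 100)
X, Z = np.meshgrid(x, z)

rng = np.random.default_rng(0)
y_true = f(X, Z)
y_noisy = y_true + rng.normal(0, 0.1, size=y_true.shape)  # e ~ N(0, 0.1^2)
```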


Comparison

We want to compare two modelling techniques by their ability to approximate the true functional form:

  - use \(16\) evenly spaced knots to construct a thin-plate spline basis
  - construct a full-rank thin-plate spline basis & use eigendecomposition to produce a rank-\(16\) basis

Random Sample

\(10000\) points are generated to represent the true functional form; however, the models are trained on a random sample of \(100\) of the \(10000\) available points.

Evenly spaced Knots & Radial Basis Approach

Choose 16 evenly spaced knots, construct a thin-plate spline basis, and fit this to the data (unpenalized).

Radial Basis Functions:

\[h_j(x) = \|x-x_j\|^2 \log \|x-x_j\|\]

If \(u\) is the Euclidean distance between \(2\) points, this reduces to:

\[r(u) = u^2 \log(u)\]

Knot locations are generated by:

  - placing \(4\) evenly spaced knots on each axis
  - using each combination as a point on the \(2\)-d plane
  - resulting in \(16\) evenly spaced locations on the \(2\)-d plane
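The knot construction & basis evaluation can be sketched as below, with the details assumed: \(4\) evenly spaced knots per axis give \(16\) knots on the unit square, & the thin-plate radial basis is \(r(u) = u^2 \log(u)\):

```python
import numpy as np

# Assumed sketch: 4 evenly spaced knots per axis -> 16 knots on the unit square
kx = np.linspace(0, 1, 4)
knots = np.array([(a, b) for a in kx for b in kx])   # 16 knot locations

def tps_basis(points, knots):
    # pairwise Euclidean distances between sample points & knots
    u = np.linalg.norm(points[:, None, :] - knots[None, :, :], axis=2)
    with np.errstate(divide="ignore", invalid="ignore"):
        B = u**2 * np.log(u)           # r(u) = u^2 log(u)
    B[u == 0] = 0.0                    # r(u) -> 0 as u -> 0, so define r(0) = 0
    return B

pts = np.random.default_rng(1).random((100, 2))  # 100 random sample points
B = tps_basis(pts, knots)                        # 100 x 16 basis matrix
```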

## 
## Call:
## lm(formula = y ~ basis)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.24264 -0.07866  0.01055  0.09217  0.29197 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.57204    0.58669  -2.680 0.008889 ** 
## basis1      -0.04213    0.99089  -0.043 0.966187    
## basis2      -0.56989    1.47945  -0.385 0.701073    
## basis3      -4.33984    1.44482  -3.004 0.003525 ** 
## basis4       2.82785    0.90703   3.118 0.002505 ** 
## basis5      -0.90146    1.31515  -0.685 0.494972    
## basis6       5.48267    1.38931   3.946 0.000165 ***
## basis7      -3.84454    1.25885  -3.054 0.003035 ** 
## basis8      -5.86426    1.18636  -4.943 3.95e-06 ***
## basis9      -5.12834    1.37387  -3.733 0.000346 ***
## basis10     -7.75936    1.50941  -5.141 1.79e-06 ***
## basis11      2.51623    1.16112   2.167 0.033095 *  
## basis12      1.36215    1.43546   0.949 0.345411    
## basis13      3.17519    1.07056   2.966 0.003940 ** 
## basis14     -0.07966    1.59512  -0.050 0.960290    
## basis15     -4.39715    1.40297  -3.134 0.002383 ** 
## basis16      0.90766    1.01041   0.898 0.371616    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1231 on 83 degrees of freedom
## Multiple R-squared:  0.9426, Adjusted R-squared:  0.9315 
## F-statistic: 85.14 on 16 and 83 DF,  p-value: < 2.2e-16

The evenly spaced knots radial basis spline does capture the general trend/curve in the data; the double-bump structure is captured to some degree.

The \(16\) evenly spaced radial basis fit captures the true functional form in a meaningful, accurate way. The general trend is achieved & two bumps do emerge (as in the true function). Reduced rank thin plate splines may offer an alternative that intelligently selects knot basis locations (by variance explained) to better capture the data.


Reduced Rank Thin Plate Spline Approach

As an alternative to the aforementioned technique, a full-rank thin plate spline basis is computed & thereafter the rank is reduced by selecting only the desired \(k=16\) basis functions that explain the maximum variance - computed via eigenvalue decomposition.

The implementation, adapted from [1], is given as follows:

Once the matrices \(E\) & \(T\) are computed, the algorithm is straightforward.

\(T\) contains basis functions for constant and linear terms: \(M= {m+d-1 \choose m}\) functions are linearly independent polynomials spanning the space of polynomials with degree less than \(m\) (pg. 97).

\(E\) contains the radial basis functions associated with the knots; it is initialised as a full-rank \((n \times n)\) matrix, & thereafter the eigenvalue decomposition is used to select the \(k\) basis vectors that account for the most variation.
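The rank-reduction step can be sketched as below (details assumed, in the spirit of Wood's construction [1]): take the eigendecomposition of the full \(n \times n\) matrix \(E\) & keep the \(k\) eigenvectors with the largest-magnitude eigenvalues:

```python
import numpy as np

# Sketch of the rank reduction (assumed details): eigendecompose the
# symmetric radial basis matrix E & keep the top-k eigenvectors by
# absolute eigenvalue.
def reduce_rank(E, k=16):
    eigvals, eigvecs = np.linalg.eigh(E)           # E is symmetric
    keep = np.argsort(np.abs(eigvals))[::-1][:k]   # top-k by |eigenvalue|
    U_k = eigvecs[:, keep]
    D_k = np.diag(eigvals[keep])
    basis = U_k @ D_k     # reduced-rank radial part of the design matrix
    penalty = D_k         # wiggliness penalty expressed in the reduced space
    return basis, penalty
```

The reduced basis then replaces the \(n\) radial columns with \(k\) columns while retaining most of the variation \(E\) accounts for.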

Once the bases & penalty matrix are computed, the model is fit using a standard penalized least squares approach, resulting in the parameter estimate:

\[ \hat{\beta} = (X'X+\lambda S)^{-1}X'y \]

if the model is penalized (by \(S\)) or:

\[ \hat{\beta} = (X'X)^{-1}X'y \]

if the penalty is ignored.
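The two estimators above amount to a few lines; this is a minimal sketch in which the function name & defaults are illustrative, not the assignment's own code:

```python
import numpy as np

# Minimal sketch of the two estimators: X is the design matrix built
# from the basis, S the penalty matrix, y the response.
def fit_beta(X, y, S=None, lam=1.0):
    XtX = X.T @ X
    if S is None:
        # unpenalized: beta = (X'X)^{-1} X'y
        return np.linalg.solve(XtX, X.T @ y)
    # penalized: beta = (X'X + lambda * S)^{-1} X'y
    return np.linalg.solve(XtX + lam * S, X.T @ y)
```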

Hyperparameters

This algorithm requires a few hyperparameters; the initial chosen values are:

  - \(d=2\): dimension of the surface (\(d\) covariates)
  - \(m=2\): order of differentiation in the wiggliness penalty
  - \(k=16\): rank of the basis matrix
  - \(M=3\): number of polynomial basis functions in \(T\)


Reduced Rank Thin Plate Splines Implementation

It’s not clear which parameter configuration will fit the best Reduced Rank Thin Plate Spline (RR-TPS), so I wrote a function that fits the above algorithm but allows one to tweak a few parameters:

  - penalized or unpenalized fit
  - the \(\lambda\) to be used if the \(S\) penalty is applied
  - eigenvalue decomposition (EVD) or singular value decomposition (SVD) to compute the basis matrices

The EVD or SVD parameter option was added, although only EVD was used here.
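A wrapper with those tweakable options might look as follows; this is a hypothetical sketch (the name, signature, & defaults are mine, not the assignment's code):

```python
import numpy as np

# Hypothetical wrapper mirroring the options above: penalized or
# unpenalized fit, the lambda used with S, & EVD vs SVD for the rank
# reduction of the radial part E; T holds the polynomial columns.
def fit_rr_tps(E, T, y, k=16, penalized=False, lam=1.0, method="evd"):
    if method == "evd":
        vals, vecs = np.linalg.eigh(E)             # E is symmetric
        keep = np.argsort(np.abs(vals))[::-1][:k]  # top-k by |eigenvalue|
        U_k, D_k = vecs[:, keep], np.diag(vals[keep])
    else:  # "svd"
        U, s, _ = np.linalg.svd(E)
        U_k, D_k = U[:, :k], np.diag(s[:k])
    X = np.hstack([U_k @ D_k, T])    # reduced radial part plus polynomial part
    S = np.zeros((X.shape[1], X.shape[1]))
    S[:k, :k] = D_k                  # the penalty acts on the radial part only
    XtX = X.T @ X
    if penalized:
        return np.linalg.solve(XtX + lam * S, X.T @ y)
    return np.linalg.solve(XtX, X.T @ y)
```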

Fit 1

Compute a RR-TPS using EVD, where \(T\) constitutes polynomials up to the \(1^{st}\) degree. The model is fit using unpenalized least squares.


The model fits the data in a very simplistic manner - capturing the overall structure but no detail (producing \(1\) large bump as opposed to the true known functional form). The data appears to flare up on one side - perhaps a consequence of the particular random sample. The fit is undoubtedly worse than the standard \(16\) evenly spaced knots.

Fit 2

The fit may be improved by tweaking the hyperparameters. Now, instead of fitting an unpenalized model, we fit penalized least squares to the data. \(\lambda\) is arbitrarily chosen to be \(1\).

It’s evident that the penalization reduces some of the violent movements in the fitted curve - & visually appears to give a more generalizable fit.

Fit 3

The model may be improved further by allowing for more degrees of freedom. The \(T\) matrix was previously specified to consist of polynomials up to the \(1^{st}\) degree; now let's allow polynomials up to the \(2^{nd}\) degree - first fitting non-penalized least squares.

Fit 4

The additional degrees of freedom appear to allow for a more natural fit, closer to the true functional form. Now we’ll fit the same model as above but with penalized least squares - again \(\lambda\) is arbitrarily chosen to be \(1\).

This is the best fit so far; these hyperparameters appear to allow the model to approximate the true function to a reasonable degree.

Fit 5

For good measure, a final fit is performed where the \(T\) matrix consists of polynomials up to the \(3^{rd}\) degree, fit by penalized least squares.

The excessive degrees of freedom seem to allow the model to overfit on the particular sample of training data available. This probably approximates the sample very closely & would not generalize well.


Conclusions

The intelligently spaced Reduced Rank Thin Plate Spline - given the correct hyperparameters - appears to have the flexibility to approximate a function well; however, if the training data (in our case a sample of only \(100\) points from \(10000\)) is too small it may overfit. The simpler \(16\) evenly spaced knots approach seems to work well, & it is not clear from my experiment that the additional complexity is worth it.

References

  1. Simon N. Wood, 2003. “Thin plate regression splines,” Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 65(1), pages 95-114, February.